Wellington: a novel method for the accurate identification of digital genomic footprints from DNase-seq data

Authors

  • Jason Piper
  • Markus C. Elze
  • Pierre Cauchy
  • Peter N. Cockerill
  • Constanze Bonifer
  • Sascha Ott
Abstract

The expression of eukaryotic genes is regulated by cis-regulatory elements such as promoters and enhancers, which bind sequence-specific DNA-binding proteins. One of the great challenges in the gene regulation field is to characterise these elements. This involves the identification of transcription factor (TF) binding sites within regulatory elements that are occupied in a defined regulatory context. Digestion with DNase and the subsequent analysis of regions protected from cleavage (DNase footprinting) has for many years been used to identify specific binding sites occupied by TFs at individual cis-elements with high resolution. This methodology has recently been adapted for high-throughput sequencing (DNase-seq). In this study, we describe an imbalance in the DNA strand-specific alignment information of DNase-seq data surrounding protein-DNA interactions that allows accurate prediction of occupied TF binding sites. Our study introduces a novel algorithm, Wellington, which considers the imbalance in this strand-specific information to efficiently identify DNA footprints. This algorithm significantly enhances specificity by reducing the proportion of false positives and requires significantly fewer predictions than previously reported methods to recapitulate an equal amount of ChIP-seq data. We also provide an open-source software package, pyDNase, which implements the Wellington algorithm to interface with DNase-seq data and expedite analyses.
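
The strand-imbalance idea in the abstract can be illustrated with a toy calculation. The sketch below is not the published Wellington implementation and does not use the pyDNase API. It assumes that a protected site shows forward-strand cuts piling up in the upstream flank and reverse-strand cuts in the downstream flank, and it scores that pattern with a one-sided binomial test. The function names, the `shoulder` parameter, and the per-base cut-count lists are hypothetical and chosen only for illustration.

```python
from math import comb


def binom_sf(k, n, p):
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def strand_imbalance_pvalue(fwd_cuts, rev_cuts, start, end, shoulder):
    """Toy strand-imbalance score for a candidate footprint [start, end).

    fwd_cuts / rev_cuts: per-base cut counts on each strand (hypothetical input).
    shoulder: flank width in bases (hypothetical parameter).
    Smaller return values mean stronger evidence of protection.
    """
    if start < shoulder or end + shoulder > len(fwd_cuts):
        raise ValueError("candidate site needs a full shoulder on both sides")

    # Forward-strand cuts: upstream shoulder versus inside the candidate site.
    fwd_shoulder = sum(fwd_cuts[start - shoulder:start])
    fwd_inside = sum(fwd_cuts[start:end])

    # Reverse-strand cuts: downstream shoulder versus inside the candidate site.
    rev_shoulder = sum(rev_cuts[end:end + shoulder])
    rev_inside = sum(rev_cuts[start:end])

    # Null assumption: cuts fall uniformly, so the expected shoulder share
    # depends only on the relative widths of shoulder and site.
    p_null = shoulder / (shoulder + (end - start))

    p_fwd = binom_sf(fwd_shoulder, fwd_shoulder + fwd_inside, p_null)
    p_rev = binom_sf(rev_shoulder, rev_shoulder + rev_inside, p_null)

    # Combine the two strands by multiplying their tail probabilities
    # (a simplification made for this sketch).
    return p_fwd * p_rev


# A depleted site flanked by strand-specific cut pile-ups ...
protected = strand_imbalance_pvalue(
    [5] * 20 + [0] * 20 + [1] * 20,
    [1] * 20 + [0] * 20 + [5] * 20,
    start=20, end=40, shoulder=20,
)
# ... versus uniform cutting with no footprint.
uniform = strand_imbalance_pvalue([1] * 60, [1] * 60, start=20, end=40, shoulder=20)
print(protected < uniform)
```

In this toy setting the protected site receives a far smaller score than the uniformly cut region. A real analysis would also need to handle sequencing depth, multiple testing, and varying footprint sizes, none of which this sketch attempts.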

Related resources

Corrigendum: Comparative evaluation of DNase-seq footprint identification strategies

DNase I is an enzyme preferentially cleaving DNA in highly accessible regions. Recently, Next-Generation Sequencing has been applied to DNase I assays (DNase-seq) to obtain genome-wide maps of these accessible chromatin regions. With high-depth sequencing, DNase I cleavage sites can be identified with base-pair resolution, revealing the presence of protected regions ("footprints"), correspondin...

Explicit DNase sequence bias modeling enables high-resolution transcription factor footprint detection

DNaseI footprinting is an established assay for identifying transcription factor (TF)-DNA interactions with single base pair resolution. High-throughput DNase-seq assays have recently been used to detect in vivo DNase footprints across the genome. Multiple computational approaches have been developed to identify DNase-seq footprints as predictors of TF binding. However, recent studies have poin...

DNaseR: DNase I footprinting analysis of DNase-seq data

The combination of DNase I digestion and high-throughput sequencing (DNase-seq) has been used recently to map chromatin accessibility in a given tissue or cell type on a genome-wide scale (Song and Crawford, 2010). In addition to DNase I hypersensitive sites (DHSs), short regions of protected nucleotides known as footprints can be detected using a technique known as “digital genomic footprinting...

Most brain disease-associated and eQTL haplotypes are not located within transcription factor DNase-seq footprints in brain

Dense genotyping approaches have revealed much about the genetic architecture both of gene expression and disease susceptibility. However, assigning causality to genetic variants associated with a transcriptomic or phenotypic trait presents a far greater challenge. The development of epigenomic resources by ENCODE, the Epigenomic Roadmap and others has led to strategies that seek to infer the l...

On Accounting for Sequence-Specific Bias in Genome-Wide Chromatin Accessibility Experiments: Recent Advances and Contradictions

Uncovering the protein–DNA interactions involved in cell fate, development, and disease in a time- and cell-specific manner is a fundamental goal of molecular biology. The advent of the sequencing technologies has opened a new genomic era, uncovering the information encoded in genomes, epigenomes, and transcriptomes (McPherson, 2014). For example, the popular ChIP-based techniques ChIP-seq (Johnso...

Journal title:

Volume 41, Issue

Pages -

Publication date: 2013